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EXPLOITING  STRUCTURAL  SYMMETRY 
IN  A  SPARSE  PARTIAL  PIVOTING  CODE* 

STANLEY  C.  EISENSTAT*  AND  JOSEPH  W  H.  LIU* 

Abstract.  This  short  communication  shows  how  to  take  advantage  of  structural  symmetry 
to  improve  the  performance  of  a  class  of  partial  pivoting  codes  for  the  LU  factorization  of  large 
sparse  unsymmetric  matrices.  Experimental  results  demonstrate  the  effectiveness  of  this  technique 
in  reducing  the  overall  factorization  time. 

Key  words,  sparse  LU  factorization,  partial  pivoting,  structural  symmetry 

AMS(MOS)  subject  classifications.  65F05.  65F50 

1.  Introduction.  Many  implementations  of  sparse  LU  factorization  with  par¬ 
tial  pivoting  compute  the  factors  one  row  or  column  at  a  time.  Each  step  involves 
both  symbolic  operations  (to  determine  the  nonzero  structure!  and  numeric  opera¬ 
tions.  With  the  development  of  fast  boating-point  hardware  and  vector  processors, 
the  symbolic  operations  have  come  to  represent  a  nontrivial  fraction  of  the  overall 
factorization  time.  Thus  any  sizable  reduction  in  this  symbolic  overhead  would  have 
a  significant  impact. 

The  technique  of  symmetric  reduction  (4j  exploits  structural  symmetry  to  decrease 
the  amount  of  structural  information  required  for  the  symbolic  factorization  of  a  sparse 
unsymmetric  matrix  (i.e..  for  obtaining  the  nonzero  structures  of  the  factor  matrices). 
This  has  the  practical  advantage  of  decreasing  the  run-time. 

In  this  short  communication,  we  show  how  to  use  symmetric  reduction  to  im¬ 
prove  the  performance  of  a  class  of  partial  pivoting  codes  for  the  LU  factorization  of 
large  sparse  unsymmetric  matrices,  in  particular.  Sherman's  NSPFAC  (a  more  recent 
version  of  NSPIV  [8])  and  a  code  of  Gilbert  and  Peierls  [7j.  For  some  problems  the 
speedup  is  more  than  a  factor  of  two. 

Notation.  For  an  n  x  n  matrix  A/  and  two  sets  I  and  J  of  subscripts,  we  let 
Mu  denote  the  submatrix  of  M  determined  by  the  rows  in  I  and  the  columns  in  J. 
As  a  special  case,  we  let  A//,  denote  the  submatrix  of  M  determined  by  the  rows  in 
/. 

We  let  G(A/)  denote  the  associated  directed  graph.  Here  edges  are  directed  from 
row  to  column:  i.e.,  ( r ,  c)  is  an  edge  in  G(M)  if  and  only  if  mrc  is  nonzero.  We  use 
\f 

the  notation  r  •J— >  c  to  indicate  the  existence  of  an  edge  from  r  to  c.  in  G(M),  and 
r  c  to  indicate  the  existence  of  a  path  from  r  to  c.  We  also  adopt  the  conventic 
that  i  ==>  i  for  any  i. 

2.  Unsymmetric  symbolic  factorization.  Let  A  be  a  sparse  unsvmmetr 
n  x  n  matrix  that  can  be  decomposed  (without  pivoting)  into  L  -  U,  where  L  is  lowt 
triangular  with  unit  diagonal  and  U  is  upper  triangular.  Let  F  denote  the  filled  matn 
L  +  U. 
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Assume  that  we  have  determined  the  nonzero  structures  of  the  first  k  -  1  rows  of 

L  and  U ;  i.c.,  letting  A'  =  { 1. ....  k  -  1 }  and  K  —  (  k . ;t}.  we  know  the  structure 

of 

Fk.  =  Lk .  +  Uk .  =  (  L^k  0  )  (  I'kk  (  ) 

The  following  result  relates  the  structures  of  the  rows  Lk.  and  l\.  to  the  existence 
of  certain  paths  in  GIUk.). 

Theorem  2.1  (see  [7]).  k  — •  i  if  and  only  if  k  — -  m  ••==£  /  for  some  rn. 

Thus,  to  determine  the  nonzero  structure  of  F\  .  —  Lk.  I  . .  we  can  search 
G(U\.)  for  nodes  reachable  from  some  node  m  for  which  atm  -  0. 

3.  Two  sparse  partial  pivoting  codes.  We  fo<  us  on  two  implementations  of 
sparse  LU  factorization  with  partial  pivoting:  Sherman  s  NSPFAC  (a  descendant  of 
NSPIV  (8|)  and  Gilbert  and  Peierls  s  code  7'  (referred  to  here  as  GP). 

NSPFAC  factors  .4  by  rows  using  column  partial  pivoting.  While  computing  Ft.. 
it  represents  the  structure  of  the  current,  partially  formed  row  oy  an  ordered,  linked 
list  of  subscripts  corresponding  to  nonzero  columns.  The  linked  list  is  initialized  to 
the  nonzero  columns  in  Ak.-  For  each  nonzero  Uj  (in  increasing  column  order |.  the 
structural  and  numeric  updates  from  U}.  to  l\.  are  applied  in  a  single  loop,  one 
element  at  a  time.  The  numeric  update  involves  two  levels  of  indirection. 

Gilbert  and  Peierls  [7]  observed  that  it  is  not  necessary  to  apply  the  row  updates 
in  increasing  order—  any  order  consistent  with  a  topological  order  of  G{L'kkI  would 
suffice.  They  also  noted  that  a  depth-first  search  of  Gl I'k.)  starting  from  the  nonzero 
columns  of  .4*..  gives  the  nonzero  structure  of  Ft-.,  and  that  a  topological  ordering 
can  be  obtained  as  a  byproduct,  without  additional  work.  Csing  this  result,  they 
show  that  GP  runs  in  time  proportional  to  the  number  of  floating-point  operations, 
a  property  not  shared  by  other  sparse  partial  pivoting  codes. 

In  computing  Ft-..1 2  GP  first  does  a  depth-first  search  to  compute  the  structure 
of  Lk.  (but  not  i't;  j  as  above.  Then,  for  each  nonzero  » t}  (in  topological  order),  it 
applies  the  structural  updates  from  Uj.  to  Uk.  and  the  numeric  updates  from  U}.  to 
Fk.  in  a  single  loop,  one  element  at  a  time. 

To  estimate  the  time  NSPFAC  and  GP  spend  in  uonnumeric  computations,  we 
wrote  a  sparse  LU  factorization  code  (called  NF)  that  uses  a  predetermined  pivot 
sequence  and  precomputed  factor  structures.*  By  using  the  same  pivot  sequence  and 
factor  structures  as  computed  by  NSPFAC  or  GP.  we  can  measure  how  much  time 
would  be  spent  if  the  nonnumeric  operations  involving  symbolic  factorization  and 
pivot  selection  were  removed. 

Table  2  gives  the  run-times3  for  ten  problems  from  the  Harwell  Boeing  collection 
[3].  For  each  test  matrix  A.  the  rows  of  the  matrix  were  preordcred  bv  a  minimum 
degree  ordering  of  .4.4‘,  as  suggested  by  George  and  Ng  [5].  The  results  for  the  Sun 
SparcStation/1  show  that  the  nonnumeric  overhead  can  exceed  50  percent.  For  the 

1  Although  GP  computes  the  LU  factorization  by  columns  using  row  partial  pivoting,  to  be 
consistent  we  describe  the  Gilbert-Peieris  approach  by  rows.  In  the  numerical  experiments.  GP 
factored  A‘  rather  than  A. 

2NSPFAC  scales  rows  by  multiplying  by  the  reciprocal  of  the  pivot:  GP  scales  columns  by  dividing 
by  the  pivot.  To  make  the  comparisons  fair,  we  used  two  versions  of  NF. 

3 All  programs  were  written  in  Fortran;  use  double-precision  arithmetic,  and  were  compiled  with 
optimization  enabled  (f77  -0  (SC1.0  Fortran  VI. 4)  on  the  SparcStation/1.  xlf  -0  (XL  FORTRAN 
Compiler/6000  Version  2.2)  on  the  RS/6000). 
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Table  1 

Nonzeros  tn  ortq’noi  and  filled  matrices. 


Problem 

n 

nz(A) 

az(Ffjsp) 

nz(F0p) 

GEMAT11 

4929 

33185 

79774 

79757 

JPWH991 

991 

6027 

134741 

131502 

LNS3937 

3937 

25407 

403017 

403520 

LNSP3937 

3937 

25407 

383313 

383340 

MCFE 

765 

24382 

68288 

68288 

ORANI678 

2529 

90158 

262250 

262365 

ORSREG1 

2205 

14133 

374957 

374957 

SAYLR4 

3564 

22316 

624742 

624742 

SHERMAN3 

5005 

20033 

409475 

409475 

SHERMANS 

3312 

20793 

242556 

242556 

Table  2 

Time  ( tn  seconds)  for  NSPFAC/GP  and  NF  unth  the  same  ptuot  sequence. 


Problem 

SparcStation/ 1 

i  RS/6000 

NSP 

NF 

GP 

NF 

NSP 

NF 

GP 

NF 

GEM  ATI  1 

2.61 

1.44 

3.23 

1.59 

1.75 

0.48 

1.98 

0.50 

JPWH991 

29.79 

17.75 

32.54 

18.75 

19.53 

5.20 

18.97 

5.10 

LNS3937 

68.58 

.39.76 

73.79 

42.86 

43.27 

11.88 

45.02 

12.38 

LNSP3937 

63.34 

35.16 

65.42 

37.92 

38.38 

10.58 

39.77 

10.98 

MCFE 

7.18 

4.16 

7.89 

4.63 

4.83 

1.28 

4  65 

1  32 

ORANI678 

32.17 

15.64 

29.41 

16.81 

21.82 

4.87 

17.78 

5.13 

ORSREG1 

92.43 

56.26 

103.94 

60.40 

60.23 

16.33 

62.20 

17.33 

SAYLR4 

168.39 

102.90 

189.65 

110.18 

110.07 

29.80 

118.82 

30.78 

SHERMAN3 

97.31 

58.60 

107.58 

62.70 

62.88 

17.10 

65.18 

17.32 

SHERMAN5 

42.50 

26.01 

47.98 

27.88 

27.90 

7.73 

29.25 

7.92 

IBM  RS/6000  Model  320.  which  has  relatively  faster  (with  respect  to  the  speed  of  its 
integer  unit)  floating-point  hardware,  the  nonnumeric  overhead  can  exceed  70  percent. 

4.  Symmetric  reduction.  Theorem  2.1  characterizes  the  nonzero  structure  of 
Fk,  in  terms  of  the  structure  of  Ak.  and  paths  in  the  graph  G(Uk .)•  But  by  removing 
from  G(Uk «)  edges  that  are  not  needed  to  preserve  the  set  of  paths,  a  process  called 
transitive  reduction  [1],  we  can  decrease  the  amount  of  searching  required  to  determine 
the  structure. 

If  we  remove  all  such  redundant  edges,  then  we  get  the  elimination  dag  (directed 
acyclic  graph)  [6],  the  minimal  subgraph  that  preserves  paths.  However,  if  we  remove 
fewer  redundant  edges,  we  will  still  preserve  the  set  of  paths.  The  search  time  will 
be  larger  than  for  the  elimination  dag,  but  the  total  time  (including  the  time  for  the 
reduction)  may  be  less. 

Symmetric  reduction  [4]  is  based  on  structural  symmetry  in  the  filled  matrix  F. 
The  symmetric  reduction  of  G(f/>c.)  is  obtained  by  deleting  all  edges  iu  m)  for  which 
*  Uij  i=  0  for  some  j  <  minjfc.  m}.  In  effect,  all  nonzeros  to  the  right  of  the  first 
symmetric  nonzero  are  deleted;  if  no  such  symmetric  nonzero  exists,  then  all  nonzero 
entries  axe  kept.  We  denote  the  resulting  symmetrically  reduced  matrix  by  Uk,. 

Figure  1  shows  the  structures  of  two  partial  factor  matrices  FKt.,  and  Fk6> > 
where  K4  —  {1,2, 3,4}  and  Ks  —  (1,2, 3, 4, 5}.  We  use  to  indicate  a  nonzero 
entry  in  the  original  matrix,  and  “o”  an  entry  that  fills  in.  Since  £ 41  *  um  is  the  only 
symmetric  nonzero  pair  in  Fk4,»,  only  the  nonzeros  to  the  right  of  U14  are  pruned 
from  t//c4,«  to  get  Uk,*-  On  the  other  hand,  there  are  two  more  symmetric  nonzero 
pairs  in  F/f5i„,  £52  *  «25  and  £54  *  U45,  so  that  nonzeros  are  pruned  in  rows  2  and  4  to 
get 
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/  1  .  .  .  ^ 

1  i 

(  l 

.  \ 

Fk,,-  = 

2  •  • 

•  3  o  o  o 

*> 

• 

li 

2  •  • 

3  o  o  o 

^  •  4  o  •  o  / 

i  ' 

4  o  «  o  f 

fir...  = 


1  *  *  • 

1  *  \ 

2  •  • 

2  • 

•  3  o  o  o 

Fk,.-  = 

3  o  o  o 

• 

o 

• 

0 

1  - 

0 

o 

o 

• 

• 

l  r,  o  o  i 

Fig.  1.  An  example  to  diusirafr  symmetric  reduction. 


Table  3 

Normalized  time  for  the  original  and  two  modified  versions  of  NSPFAC/GP. 


mm 

SparcStation/ 1 

IBM  RS/6000 

NSP 

Red 

Mod 

GP 

Red 

Mod 

NSP 

Red 

Mod 

GP 

Red 

Mod 

1.81 

1.76 

1.56 

2.03 

1.69 

1.64 

3.65 

2.50 

2.29 

3.96 

2.64 

2.50 

1.68 

1.27 

1.09 

1.74 

1.09 

1.09 

3.76 

1.58 

1,19 

3.72 

1.24 

1.21 

H 

1.72 

1.30 

1.12 

1.72 

1.12 

1.12 

3.64 

1.67 

1.27 

3.64 

1.36 

1.33 

1.80 

1.32 

1.12 

1.73 

1.14 

1.13 

3.63 

1.68 

1  29 

3  62 

1  36 

1.33 

MCFE 

1.73 

1.39 

1.21 

1.70 

1.19 

1.17 

3.77 

1.86 

1 .43 

3.52 

1.42 

1.38 

ORANI678 

2.06 

1.79 

■titl 

1.75 

1.25 

1.21 

4.48 

2.86 

2.51 

3  47 

1.62 

1.49 

ORSREG1 

1.64 

1.27 

1.07 

1.72 

1.08 

1.08 

3.69 

1.58 

1.17 

3  59 

1.20 

1.18 

SAYLR4 

1.64 

1.26 

1.07 

1.72 

'110  if* 

1.07 

3.69 

1.59 

1.20 

3.86 

1.26 

1  24 

SHERMAN3 

1.66 

1.08 

1.72 

ilrol 

IliU 

3.68 

1.59 

1.18 

3.76 

1.27 

1.25 

SHERMAN5 

1.63 

1.29 

1.09 

1.72 

l.n 

1.11 

3.61 

1.63 

1.23 

3.69 

1.33 

1.28 

Harmonic  Mean 

1.73 

1.37 

1.17 

1.75 

1.16 

1.15 

3.75 

1.78 

1.37 

3.68 

1  40 

1.36 

Symmetric  reduction  preserves  the  set  of  paths  in  G(U)  (see  [4]).  The  argument 
can  be  adapted  to  show  that  it  also  preserves  the  set  of  paths  in  G(Uk « )•  The  following 
result  is  an  immediate  corollary  of  this  observation  and  Theorem  2.1. 

COROLLARY  4.1.  k  — ►  f  if  and  only  if  k  — m  ==5  i  for  some  m. 

5.  Numerical  experiments.  We  incorporated  symmetric  reduction  into  NSP- 
FAC  and  GP.  In  the  process,  we  made  a  number  of  small  modifications  to  the  codes. 

In  NSPFAC,  we  split  the  innermost  loop  so  that,  when  applying  the  update  from 
Uj,  to  Fk.,  we  complete  the  structural  update  before  performing  the  numeric  update. 
Furthermore,  we  removed  one  of  the  two  levels  of  indirection  from  the  numeric  update. 

In  GP,  we  removed  the  structural  update  to  Uk-  from  the  innermost  loop  and 
disabled  the  test  for  accidental  cancellation,  for  otherwise  symmetric  reduction  might 
not  preserve  paths.  Furthermore,  we  combined  the  symbolic  computation  of  Lk-  and 
Uk»  into  a  single  depth-first  search  that  computes  the  structure  of  Fk.  using  Corollary 
4.1. 

Table  3  presents  the  ratios  of  the  run-times  of  the  original  and  two  modified  ver¬ 
sions  of  NSPFAC  and  GP  to  the  corresponding  NF  using  the  same  pivot  sequence. 
The  versions  labeled  “Red”  include  only  those  changes  needed  to  incorporate  symmet¬ 
ric  reduction;  the  versions  labeled  “Mod”  also  include  the  changes  that  remove  one 
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level  of  indirection  (NSPFAC)  or  combine  the  depth-first  searches  (GP).  As  in  Table 
2,  the  rows  of  each  test  matrix  A  were  preordered  by  a  minimum  degree  ordering  on 
A  A1. 

The  results  show  a  dramatic  decrease  in  the  overall  factorization  time.  The  re¬ 
duction  is  more  pronounced  on  the  RS/6000  due  to  the  relatively  faster  floating-point 
hardware.  An  even  more  dramatic  reduction  would  be  expected  on  a  vector  processor. 

There  are  other  wavs  to  improve  these  sparse  partial  pivoting  codes.  One  is  to 
use  path-symmetric  or  partial  path-symmetric  reduction,  as  described  in  I4j.  Another 
is  to  switch  from  nodal  to  supernodal  elimination  [2J,  which  we  expect  will  give  a 
substantial  improvement.  A  code  with  these  features  is  currently  under  development 
by  the  authors. 

Acknowledgment.  The  authors  thank  John  Gilbert  for  making  available  a  pre¬ 
release  of  sparse  Matlab,  which  was  used  to  generate  the  row  orderings  for  the  test 
problems,  and  for  suggesting  the  notation  used  for  edges  and  paths. 
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